Abstract: We consider the Hopfield model with $n$ neurons and an increasing number
$p = p(n)$ of randomly chosen patterns and use Stein's method to obtain rates of
convergence for the central limit theorem of overlap parameters, which holds for every
fixed choice of the overlap parameter for almost all realisations of the random patterns.

1. Introduction

1.1 The Hopfield model. The so-called Hopfield model was introduced by Figotin and Pastur
in [T5] and [TB] as a model for a spin glass. They studied a class of spin glass models which
also included the one with the energy function known today as the Hopfield model. The same
model was introduced by Hopfield in [2] in the context of neural networks as a model for an
associative memory with $n \in \mathbb{N}$ neurons. Thus Hopfield linked the study of neural
networks to that of spin models. The success of the model was mainly based on this
reinterpretation, and therefore it may be right to call it the Hopfield model. Being a model
for an associative (also termed content-addressable) memory, it is not derived directly from
a physical or biological system. Roughly speaking, the recognition and/or retrieval of one out
of $p \in \mathbb{N}$ stored patterns constitutes the central problem of the model. This means
that one wants to store a certain amount of information and to recognize it on the basis of
partial or corrupted data, a task that is difficult for a usual search algorithm.
We consider a system of $n \in \mathbb{N}$ neurons. Each neuron can be in one of two possible
states, either $-1$ or $1$. We denote by $\sigma_i \in \{-1,1\}$ the neural activity of the
$i$th neuron, $i \in \{1,\dots,n\}$; in the context of spin systems, $\sigma_i$ is the spin
variable at $i \in \{1,\dots,n\}$. Thus a spin configuration $(\sigma_1,\dots,\sigma_n)$ is
taken from the set of spin configurations $\{-1,1\}^n$. In general the instantaneous
configuration of all the spin variables at a given time describes the state of such a network.
Furthermore let $(\Omega, \mathfrak{F}, P)$ be an abstract probability space. The model
consists of $p \in \mathbb{N}$ stored patterns on this space, indexed by $\mu \in \{1,\dots,p\}$.
Thus $\xi^\mu = (\xi_1^\mu,\dots,\xi_n^\mu) \in \{-1,1\}^n$ describes the codification of the
$\mu$th stored pattern. $(\sigma_i)_{i\in\mathbb{N}}$ and $(\xi_i^\mu)_{i\in\mathbb{N}}$ with
$\mu \in \mathbb{N}$ are considered to be random variables and we assume that the family of
random variables $\{\sigma_i, \xi_j^\mu \mid i,j,\mu \in \mathbb{N}\}$ is independent.
Additionally we assume that the random variables satisfy $P(\sigma_i = \pm 1) = 1/2$ and
$P(\xi_i^\mu = \pm 1) = 1/2$. Thus we denote by
$P_\xi = (\tfrac12\delta_{-1} + \tfrac12\delta_1)^{\otimes\mathbb{N}^2}$ the marginal
distribution of the patterns $\xi = (\xi_i^\mu)_{i,\mu\in\mathbb{N}}$, and similarly by
$P_\sigma = (\tfrac12\delta_{-1} + \tfrac12\delta_1)^{\otimes\mathbb{N}}$ the marginal
distribution of the spin variables $\sigma = (\sigma_i)_{i\in\mathbb{N}}$. As $n \to \infty$,
$p$ can either be fixed or increasing with $n$. Now let

$$H_n(\sigma,\xi) = -\frac{1}{2n}\sum_{i,j=1}^{n}\sum_{\mu=1}^{p}\xi_i^\mu\xi_j^\mu\sigma_i\sigma_j, \qquad n \in \mathbb{N}, \tag{1.1}$$

denote the Hopfield Hamiltonian. At this point one might notice the spin-flip symmetry
$H_n(-\sigma,\xi) = H_n(\sigma,\xi)$, showing that the Hopfield model cannot distinguish
between a spin configuration and its negative. Governed by this Hamiltonian, [TJ presented a
generalized Glauber single-spin dynamics on the set of spin configurations at finite
temperature $1/\beta \in (0,\infty)$, which describes a reversible and irreducible Markov
process. The equilibrium distribution of this process is the finite-volume Gibbs measure

$$dP_{n,\beta,\xi}(\sigma) = \frac{1}{Z_{n,\beta,\xi}}\exp\big(-\beta H_n(\sigma,\xi)\big)\,dP_\sigma, \tag{1.2}$$

where the partition function $Z_{n,\beta,\xi}$ is the appropriate normalization.

In the sequel the focus of attention will be on the behavior of the so-called overlap under
the equilibrium distribution $P_{n,\beta,\xi}$ as $n \to \infty$. Let

$$\xi_i = (\xi_i^\mu)_{\mu\in\{1,\dots,p\}}, \qquad i \in \{1,\dots,n\}, \tag{1.3}$$

be the vector consisting of the $i$th components of the first $p$ patterns. If $p$ is not
constant and grows with $n$, $\xi_i \in \mathbb{R}^p$ still depends on $n$ via the dimension.
We define the overlap by

$$\frac{1}{n}S_n(\sigma,\xi) = \frac{1}{n}\sum_{i=1}^{n}\xi_i\sigma_i \in \mathbb{R}^p, \tag{1.4}$$

with $\xi_i\sigma_i = (\xi_i^1\sigma_i,\dots,\xi_i^p\sigma_i)^t$. The overlap compares the spin
configuration $\sigma$ with the stored patterns $\xi^\mu$, $\mu \in \{1,\dots,p\}$: the $\mu$th
overlap parameter, that is, the $\mu$th component of (1.4), equals one if and only if
$\sigma_i = \xi_i^\mu$ for all $i \in \{1,\dots,n\}$. Definition (1.4) provides the opportunity
to express the Hamiltonian (1.1) in a more convenient way. It can be rewritten as the
quadratic function of the overlap

$$H_n(\sigma,\xi) = -\frac{1}{2n}\,\|S_n(\sigma,\xi)\|^2,$$

where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^p$. If there is no opportunity for
confusion we will drop the explicit dependence on $\sigma$ and $\xi$ and write $S_n$ and $H_n$
instead of $S_n(\sigma,\xi)$ and $H_n(\sigma,\xi)$, respectively.
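
The two forms of the Hamiltonian can be compared numerically. The following is a minimal sketch, assuming the normalization $-\frac{1}{2n}$ in (1.1); the function names and the array layout `xi[i][mu]` are illustrative choices and not part of the paper.

```python
import random


def overlap(sigma, xi):
    """Return S_n = sum_i xi_i * sigma_i as a list of length p.

    xi[i][mu] is the i-th component of pattern mu; sigma[i] is the i-th spin.
    """
    p = len(xi[0])
    return [sum(xi[i][mu] * sigma[i] for i in range(len(sigma))) for mu in range(p)]


def hamiltonian_triple_sum(sigma, xi):
    """Hamiltonian (1.1) evaluated as the explicit sum over i, j and mu."""
    n, p = len(sigma), len(xi[0])
    total = 0
    for i in range(n):
        for j in range(n):
            for mu in range(p):
                total += xi[i][mu] * xi[j][mu] * sigma[i] * sigma[j]
    return -total / (2 * n)


def hamiltonian_overlap(sigma, xi):
    """Hamiltonian written as -||S_n||^2 / (2n)."""
    s = overlap(sigma, xi)
    return -sum(c * c for c in s) / (2 * len(sigma))


rng = random.Random(0)
n, p = 6, 3
xi = [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
sigma = [rng.choice((-1, 1)) for _ in range(n)]
flipped = [-s for s in sigma]            # global spin flip
first_pattern = [row[0] for row in xi]   # spin configuration equal to pattern 1
```

Evaluating both functions on `sigma` gives the same energy, `flipped` has the same energy as `sigma`, and the first overlap parameter of `first_pattern` equals one.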

In the case $p = 1$ the Hopfield model and the Curie-Weiss model are the same apart from a
change of variable. The Curie-Weiss model is a well-known approximation of the Ising model.
The classical theory of magnetism occupies a central place in the physical literature. It
allows the study of the behavior of thermodynamic quantities such as the specific heat,
isothermal susceptibility, and magnetization in the neighborhood of the critical point.
Because of its relative simplicity and the qualitative correctness of at least some of its
predictions, it has been historically important. For our investigation of the Hopfield model
we focus on the so-called Curie-Weiss equation given by

$$\beta x = \operatorname{arctanh}(x). \tag{1.5}$$

This equation is also called the mean field or fixed point equation. Its derivation can for
example be found in [TU]. Of course this equation may have several solutions. For $\beta > 0$
let $x^{\pm}(\beta)$ denote the largest (respectively smallest) solution $x \in (-1,1)$ of
(1.5). It was shown that $x^{+}(\beta) = -x^{-}(\beta) \neq 0$ for $\beta > \beta_c$, where
$\beta_c = 1$ is the critical inverse temperature. For $\beta \le \beta_c$ we have
$x^{\pm}(\beta) = 0$. This definition of the Curie-Weiss equation can be extended to the case
of an external magnetic field with strength $h \neq 0$, yielding

$$\beta x + h = \operatorname{arctanh}(x). \tag{1.6}$$

Here let $x(\beta,h)$ denote the solution of (1.6) which satisfies
$\operatorname{sign}(x) = \operatorname{sign}(h)$. As we will see, these solutions of the
Curie-Weiss equation play an important role when discussing the Hopfield model. Abbreviate

$$x^* := \begin{cases} x^{+}(\beta), & \text{if } h = 0,\\ x(\beta,h), & \text{otherwise.} \end{cases}$$
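
The solutions of (1.5) and (1.6) have no closed form but are easy to compute. The sketch below, an illustration rather than anything taken from the paper, solves the equivalent equation $\tanh(\beta x + h) = x$ by bisection for $h \ge 0$; the function name `curie_weiss_solution` and the tolerance are assumptions.

```python
import math


def curie_weiss_solution(beta, h=0.0, tol=1e-12):
    """Largest solution x in [0, 1) of beta*x + h = arctanh(x), for h >= 0.

    The equation is solved in the equivalent form tanh(beta*x + h) = x
    by bisection on f(x) = tanh(beta*x + h) - x.
    """
    if h == 0.0 and beta <= 1.0:
        return 0.0  # only the trivial solution exists in this regime

    def f(x):
        return math.tanh(beta * x + h) - x

    # f is positive just right of 0 (for beta > 1 or h > 0) and negative at 1,
    # and it changes sign exactly once in between because tanh is concave there.
    lo = 1e-9 if h == 0.0 else 0.0
    hi = 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_low_temperature = curie_weiss_solution(2.0)        # beta > beta_c, h = 0
x_high_temperature = curie_weiss_solution(0.5)       # beta < beta_c, h = 0
x_with_field = curie_weiss_solution(0.5, 0.3)        # h > 0
```

For $h < 0$ the symmetric solution is obtained as `-curie_weiss_solution(beta, -h)`.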



To investigate the behaviour of the overlap, we also extend the notion of the Gibbs measure
$P_{n,\beta,\xi}$ given in (1.2) to the case of an external magnetic field $he_l$ of strength
$h$ in the direction of the $l$th unit vector $e_l \in \mathbb{R}^p$. Thus, let

$$dP_{n,\beta,he_l,\xi}(\sigma) = \frac{1}{Z_{n,\beta,he_l,\xi}}\exp\big(-\beta H_n + (S_n, he_l)\big)\,dP_\sigma, \tag{1.7}$$

where $Z_{n,\beta,he_l,\xi}$ denotes the appropriate normalization.

For $\beta > 0$ and a field $he_l$ in the direction of the $l$th unit vector $e_l$ it was shown
in [4] that for $P_\xi$-almost all realizations of the patterns $\xi$ and if $p/n \to 0$ the
overlap $S_n/n$ satisfies the law of large numbers.

The authors in [4] stated that the condition on $p$ is the weakest possible under which the law
of large numbers is satisfied. Note that for $\beta < \beta_c = 1$ and $h = 0$ we have
$x^* = 0$ and thus $\delta_0$ is the unique limiting measure in the high-temperature region.
For $\beta > 1$ it was mentioned that the measures of the law of large numbers are all distinct,
and they were referred to as so-called extremal measures.

The corresponding large deviation principle (LDP for short) was established in [2]. Under
the assumption $p(n)/n \to 0$, for almost all $\xi$ the sequence $(S_n/n)_n$ under the Gibbs
measure $P_{n,\beta,\xi}$ obeys a LDP with speed $n$ and deterministic rate function $I$. If the
inverse temperature $\beta$ is different from the critical inverse temperature $\beta_c = 1$
and $p(n)/n \to \infty$, the overlap parameter multiplied by $n^{\gamma}$ with
$1/2 < \gamma < 1$ obeys a LDP with speed $n^{1-\gamma}$ and a quadratic rate function,
see [7]. The latter result is known as a moderate deviations principle (MDP for short).
On the scale of fluctuations, when analysing the distribution of
$\sqrt{n}(S_n/n - x^*e_l)$, the disorder becomes visible. Indeed, for $p(n)/n \to 0$ and
$(\beta,h) \neq (1,0)$ the overlap under $P_{n,\beta,\xi}$ satisfies $P_\xi$-almost surely a
central limit theorem with a covariance matrix which could be expected from the analogy with
the Curie-Weiss model and a centering which differs in the case $\beta > 1$ or $h \neq 0$
from the naively expected one by a $\xi$-dependent adjustment, see [Uj and [3]. In this paper
we aim to give an alternative proof of these central limit theorems for the overlap parameter
under $P_{n,\beta,\xi}$. We will apply Stein's method. This method has emerged as a powerful
tool for assessing the quality of distributional approximations, and it is notable for
avoiding the use of transforms and for supplying bounds, such as those of Berry-Esseen
quality, on the approximation error in the presence of dependence. We will be able to present
rates of convergence for central limit theorems for the overlap parameter, which are optimal
for the Hopfield model with a finite number of randomly chosen patterns. As in the Curie-Weiss
model, at the critical temperature $(\beta,h) = (1,0)$ the fluctuations are non-Gaussian and
the limiting distribution has a random component, see [13] and [23]. Interestingly enough, the
random term occurring in the central limit theorem is no longer present on a moderate
deviations scale, where the overlap parameter has to be multiplied by $n^{\gamma}$ with
$1/4 < \gamma < 1$: here for certain choices of $p(n)$ the rescaled overlap parameter obeys a
MDP with speed $n^{1-4\gamma}$ and a rate function that is basically a fourth power, see [7].
In any case, in this paper we do not consider the case $(\beta,h) = (1,0)$.

1.2 Statement of the main results 

General assumption. From now on we make the assumption that $p = p(n)$, $p < n$, is a
nondecreasing function of $n$ for all $n \in \mathbb{N}$.

As in [12] we choose a preferred pattern in two different ways. We consider the unbiased
Hamiltonian (1.1) and investigate the fluctuations under the condition that the overlap is
already in a neighbourhood of $x^*e_l$. Alternatively, the preferred pattern can be chosen by
introducing the magnetic field as in (1.7). In the case of (1.1) with $\beta < \beta_c$ the
central limit theorem holds with center zero. Otherwise the limit theorem requires a
$\xi$-dependent adjustment of a deterministic centering. Therefore one has to control the
influence of the random patterns. For fixed $\varepsilon > 0$ we define
1 / ( 31ogn \*\ 
e n := y/^(2 + y/a)(l + e). (1.8) 

n 

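By [12, Proposition 2.1] we see that the operator norm of

$$\Sigma^n(\xi) := \frac{1}{n}\sum_{i=1}^{n}\xi_i\xi_i^t - \mathrm{Id}_{\mathbb{R}^p}$$

converges to zero for $P_\xi$-almost all $\xi$: for $P_\xi$-almost all $\xi$, there exists an
$n_0(\xi) \in \mathbb{N}$ such that for all $n \ge n_0(\xi)$

$$\|\Sigma^n(\xi)\| \le \varepsilon_n. \tag{1.9}$$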

The following index set depends on the dimension $p$, on the inverse temperature $\beta$, on
the presence or absence of an external magnetic field $h$ and on its direction $e_l$:

$$L := \begin{cases} \{\operatorname{sign}(h)\,l\}, & \text{in the case } h \neq 0,\\ \{1\}, & \text{in the case } 0 < \beta < \beta_c \text{ and } h = 0,\\ \{-p,\dots,-1,1,\dots,p\}, & \text{in the case } \beta > \beta_c \text{ and } h = 0. \end{cases} \tag{1.10}$$

The index set $L$ is used to describe those directions that the overlap favors under the
equilibrium measure. In $(\beta_c, 0)$ the central limit theorem fails (see [12]). Thus we do
not need $L$ for these parameters. The following result is proved in [12, Proposition 2.3] and
is an important step for defining the centering.

Proposition 1.1.

Let $\beta > 0$ and $h \ge 0$ such that $(\beta,h) \neq (\beta_c,0)$ and
$l \in \{-p,\dots,-1,1,\dots,p\}$. For $\lambda \in \mathbb{R}^p$, we define the
$\xi$-dependent function

$$\Phi(\lambda) := -\frac{1}{2\beta}\|\lambda - he_l\|^2 + \frac{1}{n}\sum_{j=1}^{n}\log\cosh(\lambda,\xi_j). \tag{1.11}$$

Then, for all strictly positive $c_1 < (1 - \beta(1 - (x^*)^2))/\beta$, there exists an
$r_1 > 0$, depending on $\beta$, $h$ and $c_1$ only, and for $P_\xi$-almost all $\xi$ there
exists an $n_1(\xi) \ge n_0(\xi)$, which does not depend on the choice of $l$, such that for
all $n \ge n_1(\xi)$ the following assertions hold:

(1) For all $\lambda$ in the closed ball $B_{r_1}(\operatorname{arctanh}(x^*e_l))$, the matrix
$-D^2\Phi(\lambda)$ is uniformly positive definite in the sense that

$$(u, -D^2\Phi(\lambda)u) \ge c_1\|u\|^2 \quad \text{for all } u \in \mathbb{R}^p.$$

(2) On the set $B_{r_1}(\operatorname{arctanh}(x^*e_l))$, the map $\Phi$ has a unique maximum
which is attained in the point $\lambda_l^n(\xi)$ satisfying

$$\|\lambda_l^n(\xi) - \operatorname{arctanh}(x^*e_l)\| \le c_2\varepsilon_n$$

with $c_2 = 2|x^*|/c_1$. In particular, $\lambda_l^n(\xi) = 0$ in the case $\beta < \beta_c$
and $h = 0$.

Remark 1.2. The function $\Phi$ defined in (1.11) is sometimes called the quenched free energy
of the Hopfield model. If the realizations $\xi_1,\dots,\xi_n$ take all possible values with the
same frequency and $n$ is a multiple of $2^p$, then
$\lambda_l^n(\xi) = \operatorname{arctanh}(x^*e_l)$.
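
The balanced case of Remark 1.2 can be checked by hand or numerically. The sketch below assumes the form $\Phi(\lambda) = -\frac{1}{2\beta}\|\lambda - he_l\|^2 + \frac1n\sum_j \log\cosh(\lambda,\xi_j)$ and takes as patterns every sign vector in $\{-1,1\}^p$ exactly once, so $n = 2^p$. It evaluates the gradient of $\Phi$ at $\operatorname{arctanh}(x^*)e_l$; the helper names and the fixed-point iteration used to obtain $x^*$ are illustrative assumptions.

```python
import itertools
import math


def grad_phi(lam, xi, beta, h, l):
    """Gradient of Phi(lam) = -|lam - h e_l|^2/(2 beta) + (1/n) sum_j log cosh(<lam, xi_j>)."""
    n, p = len(xi), len(lam)
    grad = [-(lam[i] - (h if i == l else 0.0)) / beta for i in range(p)]
    for row in xi:
        t = math.tanh(sum(a * b for a, b in zip(lam, row)))
        for i in range(p):
            grad[i] += t * row[i] / n
    return grad


def positive_fixed_point(beta, h, iterations=2000):
    """Iterate x -> tanh(beta*x + h) starting from x = 1.

    For beta > 1 or h > 0 the iterates decrease monotonically to the
    positive solution of the Curie-Weiss equation.
    """
    x = 1.0
    for _ in range(iterations):
        x = math.tanh(beta * x + h)
    return x


beta, h, p, l = 2.0, 0.1, 3, 0           # l = 0 is the first pattern (0-based index)
xi = [list(row) for row in itertools.product((-1, 1), repeat=p)]  # balanced, n = 2^p
x_star = positive_fixed_point(beta, h)
lam = [math.atanh(x_star) if i == l else 0.0 for i in range(p)]
gradient = grad_phi(lam, xi, beta, h, l)
```

All components of `gradient` vanish up to rounding, so $\operatorname{arctanh}(x^*)e_l$ is a critical point of $\Phi$ for this balanced choice of patterns.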

The random centering is given by

$$x_l^n(\xi) = \frac{1}{\beta}\big(\lambda_l^n(\xi) - he_l\big) \tag{1.12}$$

with the help of $\lambda_l^n(\xi)$ for $l \in \{-p,\dots,-1,1,\dots,p\}$. Although the notation
does not indicate it, (1.12) still depends on $\beta$ and $h$. We have to extend this
definition because (1.12) is only defined for $P_\xi$-almost all $\xi$ and $n \ge n_1(\xi)$.
We assign

$$x_l^n(\xi) = \frac{1}{\beta}\big(\operatorname{arctanh}x^* - h\big)e_l = x^*e_l \tag{1.13}$$

whenever $\lambda_l^n(\xi)$ is not defined. The second equality of (1.13) is due to the
Curie-Weiss equation (1.6). Using Proposition 1.1 we see that for $\beta < \beta_c$ the
centering satisfies $x_l^n(\xi) = 0$, while for $\beta > \beta_c$ the centering is close to the
limiting point $x^*$ in the sense that

$$\|x_l^n(\xi) - x^*e_l\| \le \frac{1}{\beta}c_2\varepsilon_n \to 0 \tag{1.14}$$

as $n \to \infty$, with the constant $c_2$ from Proposition 1.1 and $\varepsilon_n$ defined in
(1.8).

From now on we will write random vectors in $\mathbb{R}^d$ in the form
$w = (w_1,\dots,w_d)^t$, where the $w_i$ are $\mathbb{R}$-valued variables for
$i = 1,\dots,d$. If a matrix $\Sigma$ is symmetric and nonnegative definite, we denote by
$\Sigma^{1/2}$ the unique symmetric, nonnegative definite square root of $\Sigma$. Id denotes
the identity matrix and from now on $Z$ will denote a random vector having standard
multivariate normal distribution. The expectation with respect to the measure
$P_{n,\beta,he_l,\xi}$ will be denoted by $E := E_{P_{n,\beta,he_l,\xi}}$.
Let $\pi_k : \mathbb{R}^p \to \mathbb{R}^k$ (with $k \le p$) denote the canonical projection.

Theorem 1.3.

Let $\beta, h > 0$, $l \in \mathbb{Z}$, $l \neq 0$, and $k \in \mathbb{N}$. We assume that $p$
depends on $n$ in a nondecreasing way satisfying $p < n$. Let $x = x_l^n(\xi)$ be defined as in
(1.12) and $W$ be the following random variable:

$$W := \sqrt{n}\,\pi_k\Big(\frac{S_n}{n} - x\Big).$$

If $Z$ has the $k$-dimensional standard normal distribution, under the measure
$P_{n,\beta,he_l,\xi}$ we have, for every three times differentiable function $g$ and
$P_\xi$-almost all $\xi$,

Eg(W) - Eg (y}' 2 z) \ < Cmax j Pv ^e n , j , 

for a constant $C$ and $\Sigma := E[WW^t]$.

Remark 1.4. The rate of convergence obtained here is useless unless 

max jpv/pe™, j 0. (1.15) 

In [21 Theorem 1.1] the authors proved that the condition $p/n \to 0$ is sufficient in order
to state the central limit theorem and show the weak convergence. In [12] and [5] there is no
information available on the speed of convergence. Obviously (1.15) is poorer, but we do not
need any conditions on $p$ in advance. Our theorem implies weak convergence.

In order to state a result for non-smooth test functions $g$ in the multivariate setting, we
introduce a class of test functions $\mathcal{G}$ following [TU]. Let again $\Phi$ denote the
standard normal distribution function in $\mathbb{R}^d$. We define for
$g : \mathbb{R}^d \to \mathbb{R}$ and $\delta > 0$

$$g_\delta^{+}(x) = \sup\{g(x+y) : |y| \le \delta\}, \tag{1.16}$$
$$g_\delta^{-}(x) = \inf\{g(x+y) : |y| \le \delta\}, \tag{1.17}$$
$$\tilde g(x,\delta) = g_\delta^{+}(x) - g_\delta^{-}(x). \tag{1.18}$$

Let $\mathcal{G}$ be a class of real measurable functions on $\mathbb{R}^d$ such that

(1) The functions $g \in \mathcal{G}$ are uniformly bounded in absolute value by a constant,
which we take to be 1 without loss of generality.

(2) For any $d \times d$ matrix $A$ and any vector $b \in \mathbb{R}^d$,
$g(Ax + b) \in \mathcal{G}$.

(3) For any $\delta > 0$ and any $g \in \mathcal{G}$, $g_\delta^{+}(x)$ and $g_\delta^{-}(x)$
are in $\mathcal{G}$.

(4) For some constant $a = a(\mathcal{G},d)$,
$\sup_{g\in\mathcal{G}}\big\{\int_{\mathbb{R}^d}\tilde g(x,\delta)\,\Phi(dx)\big\} \le a\delta$.

Obviously we may assume $a \ge 1$. Considering the one-dimensional case, we notice that the
collection of indicators of all half lines and the collection of indicators of all intervals
each form a class $\mathcal{G}$ that satisfies these conditions with $a = \sqrt{2/\pi}$ and
$a = 2\sqrt{2/\pi}$, respectively. This was shown for example in [18]. In dimension $d \ge 1$
the class of indicators of convex sets is known to be such a class. Using this class of
functions we are able to present rates of convergence for non-smooth test functions.
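
Condition (4) can be made concrete for half-line indicators in dimension one. For $g = 1_{(-\infty,c]}$ the function $\tilde g(\cdot,\delta)$ is the indicator of a window of width $2\delta$ around $c$, so the integral in (4) is $P(|Z - c| \le \delta)$ for a standard normal $Z$. The sketch below, with illustrative function names, evaluates this probability and compares it with $\sqrt{2/\pi}\,\delta$.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothing_integral_half_line(c, delta):
    """Integral of g~(x, delta) against N(0, 1) for g = indicator of (-inf, c].

    g_delta^+(x) = 1 iff x - delta <= c and g_delta^-(x) = 1 iff x + delta <= c,
    so g~(x, delta) = 1 exactly for c - delta < x <= c + delta.
    """
    return normal_cdf(c + delta) - normal_cdf(c - delta)


a_half_line = math.sqrt(2.0 / math.pi)
checks = [
    (c, delta, smoothing_integral_half_line(c, delta))
    for c in (-2.0, -0.5, 0.0, 0.7, 3.0)
    for delta in (0.01, 0.1, 0.5, 1.0)
]
```

Each value in `checks` is bounded by `a_half_line * delta`, because the standard normal density never exceeds $1/\sqrt{2\pi}$ and the window has width $2\delta$.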
Theorem 1.5.

Let $\beta, h > 0$, $l \in \mathbb{Z}$, $l \neq 0$, and $k \in \mathbb{N}$. We assume that $p$
depends on $n$ in a nondecreasing way satisfying $p < n$. Let $x = x_l^n(\xi)$ be defined as in
(1.12) and $W$ be as in Theorem 1.3. If $Z$ has the $k$-dimensional standard normal
distribution, under the measure $P_{n,\beta,he_l,\xi}$ we have, for all $g \in \mathcal{G}$
with $|g| \le 1$ and $P_\xi$-almost all $\xi$,

Eg(W) - Eg (s 1/2 z) < C log(n) max \p^/pe n , V ' 



1/2 



n 

for a constant $C$ and $\Sigma := E[WW^t]$.

In the case where $p$ is fixed the rate gets much simpler since we do not need the projection
in order to reduce the size of the vector $W$.

Theorem 1.6.

Let $\beta, h > 0$, $l \in \mathbb{Z}$ and $l \neq 0$. We assume that $p$ is fixed. Let
$x = x_l^n(\xi)$ be defined as in (1.12) and $W$ be the following random variable:

$$W := \sqrt{n}\Big(\frac{S_n}{n} - x\Big).$$

If $Z$ has the $p$-dimensional standard normal distribution, under the measure
$P_{n,\beta,he_l,\xi}$ we have, for every three times differentiable function $g$ and
$P_\xi$-almost all $\xi$,

$$\big|Eg(W) - Eg(\Sigma^{1/2}Z)\big| \le Cn^{-1/2},$$

for a constant $C$ and $\Sigma := E[WW^t]$.

With the same techniques necessary to prove Theorem 1.5 we get a theorem similar to
Theorem 1.6 with rate $\log(n)\,n^{-1/2}$.

When there is no external field it is natural to ask for the fluctuations of the overlap around
$x^*e_l$. With $L$ as in (1.10) we determine the conditional fluctuations and a rate of
convergence:

Theorem 1.7.

Let $\beta > 0$, $\beta \neq \beta_c$, $h = 0$, $l \in L$ and $k \in \mathbb{N}$. We assume that
$p$ depends on $n$ in a nondecreasing way satisfying $p < n$. Let $x = x_l^n(\xi)$ be defined as
in (1.12) and $W$ be as in Theorem 1.3. Then, if $Z$ has the $k$-dimensional standard normal
distribution, under the conditional measure

$$P_{n,\beta,\xi}\Big(\,\cdot\;\Big|\;\frac{S_n}{n} \in B(x^*e_l,\varepsilon)\Big)$$

we have for every three times differentiable function $g$ and $P_\xi$-almost all $\xi$,


Eg(W) - Eg {t}' 2 z) \ < Cmax \p^e n , j , 
for a constant $C$ and $\Sigma := E[WW^t]$.

Note that also for the case $h = 0$ a theorem for non-smooth test functions could be stated
similar to Theorem 1.5, and additionally we obtain a theorem if $p$ is fixed with rate
$n^{-1/2}$ in the same way as in Theorem 1.6.

In Section 2 of the present paper, we introduce Stein's method and present two plug-in
theorems for multivariate normal approximation. Section 3 contains some auxiliary results
which will be necessary for the proofs given in Section 4.

2. Stein's method of exchangeable pairs

Stein's method was first published in [20] (1972), starting with a bound for the distance
between univariate random variables and the normal distribution. In [2T] Stein introduced his
exchangeable pair approach. At the heart of the method is a coupling of a random variable $W$
with another random variable $W'$ such that $(W, W')$ is exchangeable, i.e. their joint
distribution is symmetric. Stein proved further that a measure of proximity of $W$ to normality
may be provided by the exchangeable pair if $W' - W$ is sufficiently small. He assumed the
property that there is a number $\lambda > 0$ such that the conditional expectation of
$W' - W$ given $W$ satisfies

$$E[W' - W \mid W] = -\lambda W.$$

Heuristically, this condition can be understood as a linear regression condition: if
$(W, W')$ were bivariate normal with correlation $\rho$, then $E[W' \mid W] = \rho W$ and the
condition would be satisfied with $\lambda = 1 - \rho$. Stein proved that for any uniformly
Lipschitz function $h$

$$|Eh(W) - Eh(Z)| \le \delta\|h'\|$$

with $Z$ denoting a standard normally distributed random variable and

$$\delta = 4E\Big|1 - \frac{1}{2\lambda}E\big[(W' - W)^2 \mid W\big]\Big| + \frac{1}{2\lambda}E|W - W'|^3.$$

Stein's approach has been successfully applied in many models, see e.g. [2T] or [22] and
references therein. In [TS] the range of application was extended by replacing the linear
regression property by a weaker condition assuming that there is also a random variable
$R = R(W)$ such that

$$E[W' - W \mid W] = -\lambda W + R.$$

While the approach has proved successful also in non-normal contexts (see [5], [6] and [Sj),
it remained restricted to the one-dimensional setting for a long time. Applying the linear
regression heuristic in the multivariate case leads to a new condition due to [T7]:

$$E[W' - W \mid W] = -\Lambda W + R \tag{2.1}$$

for an invertible $d \times d$ matrix $\Lambda$ and a remainder term $R = R(W)$. Different
exchangeable pairs, obviously, will yield different $\Lambda$ and $R$.
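
The regression condition is easiest to see in a toy case that is not part of the paper: $W = n^{-1/2}\sum_i \sigma_i$ with independent symmetric signs, and $W'$ obtained by replacing one uniformly chosen coordinate with a fresh sign. The sketch below enumerates all configurations for a small $n$ and checks that $E[W' - W \mid \sigma] = -W/n$, so the condition holds with $\lambda = 1/n$ and $R = 0$; the function name is an illustrative choice.

```python
import itertools
import math


def conditional_drift(sigma):
    """E[W' - W | sigma] for W = n^{-1/2} sum_i sigma_i, where W' replaces one
    uniformly chosen coordinate by a fresh symmetric sign."""
    n = len(sigma)
    drift = 0.0
    for i in range(n):              # coordinate i is chosen with probability 1/n
        for new in (-1, 1):         # the fresh sign takes each value with probability 1/2
            drift += (new - sigma[i]) / math.sqrt(n) / (2 * n)
    return drift


n = 5
lam = 1.0 / n
max_error = max(
    abs(conditional_drift(s) + lam * sum(s) / math.sqrt(n))
    for s in itertools.product((-1, 1), repeat=n)
)
```

Since the drift is a function of $W$ alone, conditioning on $W$ instead of $\sigma$ gives the same identity. In the Hopfield model the spins are dependent, which is why a remainder term $R$ appears.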

The theorems for smooth test functions are based on a nonsingular multivariate normal
approximation theorem taken from [T7]. To present this theorem we fix some more notation. The
transpose of the inverse of a matrix will be written in the form $A^{-t} := (A^{-1})^t$.
Furthermore we will need the supremum norm, denoted by $\|\cdot\|$ for both functions and
matrices. For derivatives of smooth functions $f : \mathbb{R}^d \to \mathbb{R}$, we use the
notation $\nabla$ for the gradient operator. For a function
$f : \mathbb{R}^d \to \mathbb{R}$, we abbreviate

$$|f|_1 := \sup_i\Big\|\frac{\partial}{\partial x_i}f\Big\|, \qquad |f|_2 := \sup_{i,j}\Big\|\frac{\partial^2}{\partial x_i\,\partial x_j}f\Big\|$$

and so on, if these derivatives exist.

Theorem 2.1. (Reinert, Röllin: 2009)

Assume that $(W, W')$ is an exchangeable pair of $\mathbb{R}^d$-valued random vectors such that

$$E[W] = 0, \qquad E[WW^t] = \Sigma,$$

with $\Sigma \in \mathbb{R}^{d\times d}$ symmetric and positive definite. If $(W, W')$ satisfies
(2.1) for an invertible matrix $\Lambda$ and a $\sigma(W)$-measurable random vector $R$ and if
$Z$ has $d$-dimensional standard normal distribution, we have for every three times
differentiable function $g$,

$$\big|Eg(W) - Eg(\Sigma^{1/2}Z)\big| \le \frac{|g|_2}{4}A + \frac{|g|_3}{12}B + \Big(|g|_1 + \frac{1}{2}d\,\|\Sigma\|^{1/2}|g|_2\Big)C,$$

where, with $\lambda^{(i)} := \sum_{m=1}^{d}|(\Lambda^{-1})_{m,i}|$,

$$A = \sum_{i,j=1}^{d}\lambda^{(i)}\sqrt{\mathbb{V}\big[E[(W_i' - W_i)(W_j' - W_j) \mid W]\big]}, \tag{2.2}$$

$$B = \sum_{i,j,k=1}^{d}\lambda^{(i)}E\big|(W_i' - W_i)(W_j' - W_j)(W_k' - W_k)\big|,$$

$$C = \sum_{i=1}^{d}\lambda^{(i)}\sqrt{\mathbb{V}[R_i]}.$$

The advantage of Stein's method is that the bounds for the distance to a multivariate normal
distribution reduce to the computation of, or bounds on, low order moments, here bounds on the
absolute third moments, on a conditional variance and on the variance of the remainder term.
Such variance computations may be difficult, but we will get rates of convergence at the same
time. In the same context as in [T7] the authors in [9] proved the following theorem,
presenting bounds for non-smooth test functions. Their development differs from [T7] in using
the relationship to the bounds in [TS].

Theorem 2.2.

Let $(W, W')$ be an exchangeable pair with $E[W] = 0$ and $E[WW^t] = \Sigma$ with
$\Sigma \in \mathbb{R}^{d\times d}$ symmetric and positive definite. Again we assume that
$(W, W')$ satisfies (2.1) for an invertible matrix $\Lambda$ and a $\sigma(W)$-measurable
random vector $R$ and additionally, for $i \in \{1,\dots,d\}$, $|W_i' - W_i| \le A$.
Then,

SMV\Eg(W)-Eg(Y}l 2 Z)\ < c\\og(r 1 )A 1 + (log^ 1 )!!^ 1 / 2 + l) A 2 

9&Q 



+ 1 + Mr 1 ) J2E\Wi\ + a) A 3 A 3 + a A 



i=i 



where 



A 1 



Ao 



E KA-^U/v 

E \(A-\^E[R 



EiiWl-WAiW'j-W^W] 



2 ] 



^ 4 3 = E. n i ax ,J( A \i 

i=1 je{l,...,d} 



C denotes a constant that depends on d, \fi = 2CA 3 A 3 and a > 1 is taken from the conditions 
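on $\mathcal{G}$, defined before Theorem 1.5.

3. Auxiliary results

The quenched free energy $\Phi$ defined in (1.11) will appear in the regression condition (2.1).

Lemma 3.1.

For $\Phi$ defined in (1.11) we obtain

$$\frac{1}{n}\sum_{j=1}^{n}\xi_j^i\tanh\big((\lambda,\xi_j)\big) = \frac{1}{\beta}(\lambda_i - h\delta_{il}) + \frac{\partial}{\partial\lambda_i}\Phi(\lambda).$$

Proof. Differentiating with respect to $\lambda_i$ yields

$$\frac{\partial}{\partial\lambda_i}\Phi(\lambda) = -\frac{1}{\beta}(\lambda_i - h\delta_{il}) + \frac{1}{n}\sum_{j=1}^{n}\frac{\sinh((\lambda,\xi_j))}{\cosh((\lambda,\xi_j))}\xi_j^i = -\frac{1}{\beta}(\lambda_i - h\delta_{il}) + \frac{1}{n}\sum_{j=1}^{n}\tanh\big((\lambda,\xi_j)\big)\xi_j^i.$$

Rearranging the equality yields the result. □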
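
The identity of Lemma 3.1 can be checked against a finite-difference approximation of the gradient of $\Phi$. The following sketch assumes the form of $\Phi$ in (1.11) with prefactors $\frac{1}{2\beta}$ and $\frac{1}{n}$; all names, the random seed and the step size are illustrative.

```python
import math
import random


def phi(lam, xi, beta, h, l):
    """Phi(lam) = -|lam - h e_l|^2 / (2 beta) + (1/n) sum_j log cosh(<lam, xi_j>)."""
    quad = sum((lam[i] - (h if i == l else 0.0)) ** 2 for i in range(len(lam)))
    logcosh = sum(math.log(math.cosh(sum(a * b for a, b in zip(lam, row)))) for row in xi)
    return -quad / (2.0 * beta) + logcosh / len(xi)


def tanh_average(lam, xi, i):
    """(1/n) sum_j xi_j^i tanh(<lam, xi_j>), the left-hand side of Lemma 3.1."""
    return sum(row[i] * math.tanh(sum(a * b for a, b in zip(lam, row))) for row in xi) / len(xi)


rng = random.Random(1)
n, p, beta, h, l = 40, 3, 1.7, 0.2, 0
xi = [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
lam = [rng.uniform(-1.0, 1.0) for _ in range(p)]

eps = 1e-6
errors = []
for i in range(p):
    up = lam[:]
    up[i] += eps
    down = lam[:]
    down[i] -= eps
    # central difference approximation of the i-th partial derivative of Phi
    dphi = (phi(up, xi, beta, h, l) - phi(down, xi, beta, h, l)) / (2 * eps)
    rhs = (lam[i] - (h if i == l else 0.0)) / beta + dphi
    errors.append(abs(tanh_average(lam, xi, i) - rhs))
```

The entries of `errors` are at the level of the finite-difference error, which is consistent with the identity.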
Moreover we consider

$$C_l^n(\xi) := -D^2\Phi(\lambda_l^n(\xi)) = \frac{1}{\beta}\mathrm{Id}_{\mathbb{R}^p} - \frac{1}{n}\sum_{i=1}^{n}\cosh^{-2}\big((\lambda_l^n(\xi),\xi_i)\big)\,\xi_i\xi_i^t,$$

with $\lambda_l^n(\xi)$ defined in Proposition 1.1.

Lemma 3.2.

Let $\beta > 0$ and $h \ge 0$ such that $(\beta,h) \neq (\beta_c,0)$. Choose an
$l \in \mathbb{Z}$, $l \neq 0$, satisfying $|l| \le p$ in the case of bounded $p$. Then there
exists a constant $c_3 > 0$ such that

$$\sup_{l\in L}\Big\|C_l^n(\xi) - \frac{1}{\beta}\big[1 - \beta(1 - (x^*)^2)\big]\mathrm{Id}_{\mathbb{R}^p}\Big\| \le c_3\sqrt{p}\,\varepsilon_n$$

for $P_\xi$-almost all $\xi$ and all $n \ge n_1(\xi)$.

Here $\|\cdot\|$ denotes the operator norm. The proof of Lemma 3.2 is given in
[12, Lemma 3.2] and uses (1.9), Proposition 1.1 and the fact that, by (1.6), $x^*$ satisfies
$\cosh^{-2}(\operatorname{arctanh}x^*) = 1 - (x^*)^2$.

Using the notation

1 P n 



^fl:=:EEfe (3.1) 

71 11=1 r=l 
1 P n 

^,0:=-EE& (3-2) 

the next lemma states an exact expression for the conditional probability that will occur in
the linear regression condition (2.1).

Lemma 3.3. 

Let o~i G { — 1,1}. Then we obtain for the conditional distribution of a single spin 



fcS{-l,l} 
and thus



E[<r< | (<7 fc ) Mi ] = tanh(/?m>,0 + 
where E denotes the expectation with respect to P n ,/3,hei,i- 
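
The tanh formula for the conditional mean of a single spin can be verified directly from the Gibbs weights. The sketch below is an illustration under two assumptions: the weight is $\exp(-\beta H_n + (S_n, he_l))$ with $H_n = -\|S_n\|^2/(2n)$, and the local field at site $i$ is the sum $\frac1n\sum_\mu\sum_{j\ne i}\xi_i^\mu\xi_j^\mu\sigma_j$ in which the site $i$ itself is left out. The function names are hypothetical.

```python
import math
import random


def gibbs_weight(sigma, xi, beta, h, l):
    """Unnormalised weight exp(-beta*H_n + <S_n, h e_l>) with H_n = -|S_n|^2 / (2n)."""
    n, p = len(sigma), len(xi[0])
    s = [sum(xi[i][mu] * sigma[i] for i in range(n)) for mu in range(p)]
    return math.exp(beta * sum(c * c for c in s) / (2.0 * n) + h * s[l])


def local_field(sigma, xi, i):
    """(1/n) sum_mu sum_{j != i} xi_i^mu xi_j^mu sigma_j (the site i itself is left out)."""
    n, p = len(sigma), len(xi[0])
    return sum(xi[i][mu] * xi[j][mu] * sigma[j]
               for mu in range(p) for j in range(n) if j != i) / n


rng = random.Random(2)
n, p, beta, h, l = 7, 2, 1.3, 0.4, 0
xi = [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
sigma = [rng.choice((-1, 1)) for _ in range(n)]

errors = []
for i in range(n):
    plus = sigma[:i] + [1] + sigma[i + 1:]
    minus = sigma[:i] + [-1] + sigma[i + 1:]
    w_plus = gibbs_weight(plus, xi, beta, h, l)
    w_minus = gibbs_weight(minus, xi, beta, h, l)
    cond_mean = (w_plus - w_minus) / (w_plus + w_minus)   # E[sigma_i | other spins]
    formula = math.tanh(beta * local_field(sigma, xi, i) + h * xi[i][l])
    errors.append(abs(cond_mean - formula))
```

The diagonal terms of the Hamiltonian do not depend on $\sigma_i$ and cancel in the ratio, which is why only the field generated by the other spins enters the formula.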



Proof. Direct calculations yield 



-Pn„8,fte,,g({<^ = t} fl ((Tfc)fc^) 
Pn,P,heiA( a k)k^i) 



p p n p n n 

ex P i e (er ) 2 + i e e ere^i + £ e e # gv* + + /» e <fc 

A*=l M=l 2=1 jU=l fc,j'=l 3=1 




_„„pn pn n 

£ ex P g + f e e gtfck + £ e e e&jv* + ^ci^ +^e e^j 

[-1,1} L ' U=1 4 =1 . j*=l fe.J=l 3=1 




exp(/3ml((T,Ot + ^it) 



E exp(/?ml((7 ) 0A ; + ^^)' 



fce{-i,i} 



where we canceled equivalent expressions in numerator and denominator and used the expres- 
sion for m\(cr, £). Thus 



Higher order moments of the rescaled empirical spin vector of the Hopfield model, appearing
in Theorems 1.3 up to 1.7, can be bounded as follows:

Lemma 3.4.

For $W$ as in Theorems 1.3 up to 1.7 we obtain that for any $l \in \mathbb{N}$ and
$j \in \{1,\dots,p\}$

$$E|W_j|^l \le \mathrm{const.}(l).$$

Proof. First we will have to make a transformation with the well-known Hubbard-Stratonovich
approach, expressing the distribution of $S_n$ in the Hopfield model in terms of $\Phi$. This
approach was for example used in [H Lemma 2.2] and in [7]. Let Id denote the $p \times p$
identity matrix, and for $\beta > 0$ and $h \ge 0$ we pick a random vector $V$ whose law
$\mathcal{L}(V)$ is that of a $p$-dimensional centered Gaussian vector with covariance matrix
$\beta^{-1}\mathrm{Id}$ and which is independent of all other random variables involved.
Additionally $\lambda := \lambda_l^n(\xi)$ denotes the maximum point of $\Phi$ taken from
Proposition 1.1. First we note that



E[<7< | (<7*) Mi ] = P({cTi = 1} U (<7 fc ) Mi ) - P({a; = -1} U (a fc ) Mi ) 



= exp(Pm\{a, g) + hgj) - exp(-Pm\{(T, Q - fog) 
exp^m*^, f) + fog) + exp(-/3m*(a, - fo£|) 

= tanh(/3mKa,0 + ^-). 



□ 



E Wj < const.(l). 
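where $P_n(S_n \in dy) = \prod_{i=1}^{n}\rho(d\sigma_i)$ and
$\rho(d\sigma_i) = \tfrac12\delta_{-1}(d\sigma_i) + \tfrac12\delta_1(d\sigma_i)$. Furthermore
for $u \in \mathbb{R}^p$ we have

$$\int_{\mathbb{R}^p}\exp\Big(\frac{\beta}{n}(u,y) + (y,he_l)\Big)P_n(S_n \in dy) = \int\exp\Big(\frac{\beta}{n}\sum_{i=1}^{n}\sum_{\mu=1}^{p}u_\mu\xi_i^\mu\sigma_i + h\sum_{i=1}^{n}\xi_i^l\sigma_i\Big)\prod_{i=1}^{n}\rho(d\sigma_i)$$

$$= \prod_{i=1}^{n}\int\exp\Big(\Big(\xi_i\sigma_i,\frac{\beta}{n}u\Big) + (\xi_i\sigma_i,he_l)\Big)\rho(d\sigma_i) = \exp\Big(\sum_{i=1}^{n}\log\cosh\Big(\xi_i,\frac{\beta u}{n} + he_l\Big)\Big).$$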
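
The factorization behind the last equality is that, for independent symmetric signs, the average of $\exp((S_n, v))$ equals $\prod_i\cosh((\xi_i, v))$. The sketch below checks this by enumerating all spin configurations for a small system; the vector `v` stands in for $\frac{\beta}{n}u + he_l$, and all names are illustrative.

```python
import itertools
import math
import random


def averaged_exponential(xi, v):
    """Average of exp(<S_n, v>) over all 2^n spin configurations (uniform spins)."""
    n, p = len(xi), len(v)
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        s = [sum(xi[i][mu] * sigma[i] for i in range(n)) for mu in range(p)]
        total += math.exp(sum(a * b for a, b in zip(s, v)))
    return total / 2 ** n


def cosh_product(xi, v):
    """exp(sum_i log cosh(<xi_i, v>)), the closed form of the same average."""
    return math.exp(sum(math.log(math.cosh(sum(a * b for a, b in zip(row, v)))) for row in xi))


rng = random.Random(3)
n, p = 8, 3
xi = [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
v = [rng.uniform(-0.5, 0.5) for _ in range(p)]   # plays the role of beta*u/n + h*e_l
lhs = averaged_exponential(xi, v)
rhs = cosh_product(xi, v)
```

Both quantities agree up to rounding, since each factor $\int\exp(\sigma_i(\xi_i,v))\rho(d\sigma_i)$ equals $\cosh((\xi_i,v))$.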

Hence, for t G R, x := and -A(ra) = y^t + nx we obtain 

p(V + v^(— -x) <t] 



= P(^nV + S n < A(n)) 

'0 



Z nSMiA J ex P I 2^'^ + ^' ^ 



p/2 







v<A(n)—y 



2^) exp^(T,,t,>JdT,P B (5 B Gdy) 



The substitution u — v + y and abbreviating C Pi „ :— Z n ^ hei ^ 
P(V + - a;) <*) 

= C p , n y exp((y,/ie/)) ^ exp f ^— (w, u) j exp f-(u,y) j duP n {S n G dy). 



u<A(n) 

The abbreviation (7p )n = C Pi „n P//2 yields 



= C p , n J exp I - — («,u) + ^logcosh^, — + /iej) J du 

a i~\ \ i=l ' / 



u<A(n) 



= C P: n J exp ( - 



z<t 
n 



£ 

2n 



{y^nz + n\ — nhei, y/nz + n\ — nhei) 



I3z 



+ l°g cos h(^, —j= + A - hei + hei) dz 



i=i 

C p , n J exp (^j= + dz, 



where we used the substitution u = ^/nz + nx for the second equality. Thus, we have 



£(^+v^(^-*)) = 4:U. f ex P 



(3.3) 
where the constant in front of the integral denotes a normalization. Applying this
transformation does not change the finiteness of any of the moments of the $W_j$. Thus the new
measure has the density (3.3). Using second-order multivariate Taylor expansion of $\Phi$
(see (5.1)) and the fact that $\lambda$ is a maximum point of $\Phi$ we see that the density of
this new measure with respect to the Lebesgue measure is given by



const, exp 



-^(y-D^(X)y) 



(up to negligible terms). By Proposition 1.1 (1) we know that for any
$(\beta,h) \neq (\beta_c,0)$ the Hessian $-D^2\Phi(\lambda)$ is uniformly positive definite.
This fact combined with the transformation of integrals yields that a measure with this density
has moments of any finite order. □



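4. Proofs of the Theorems

Constructing an exchangeable pair in the Hopfield model to obtain an approximate linear
regression property (2.1) leads us to $\Phi$ taken from (1.11). Let
$(\beta,h) \neq (\beta_c,0)$, and let $x := x_l^n(\xi)$ denote the centering (1.12) associated
with the unique maximum point $\lambda_l^n(\xi)$ of $\Phi$, see Proposition 1.1. For
$k \in \mathbb{N}$ fixed, $k \le p$, we consider

$$W := \sqrt{n}\,\pi_k\Big(\frac{S_n}{n} - x\Big) = \pi_k\Big(\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\xi_j\sigma_j - \sqrt{n}\,x\Big).$$
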

We start by constructing an exchangeable pair. To this end we produce a spin collection
$\sigma' = (\sigma_i')_{i\ge1}$ via a Gibbs sampling procedure: we take $I$ to be a random
variable that is uniformly distributed over $\{1,\dots,n\}$ and independent of all other random
variables involved. Exchanging the spin $\sigma_I$ with $\sigma_I'$ drawn from the conditional
distribution of the $I$th coordinate given $(\sigma_j)_{j\neq I}$ under
$P_{n,\beta,he_l,\xi}$, independently of $\sigma_I$, we obtain

$$W' := W - \frac{1}{\sqrt{n}}\pi_k(\xi_I\sigma_I) + \frac{1}{\sqrt{n}}\pi_k(\xi_I\sigma_I'). \tag{4.1}$$

In this case $(W, W')$ is an exchangeable pair. Let
$\mathcal{F} := \sigma(\sigma_i, \xi_j^\mu;\ i,j,\mu \in \mathbb{N})$. We obtain that for
any $i = 1,\dots,k$:



Using the law of total probability for the conditional expectation and independence we have 

11™ 

Since Oi and i, j e N, are measurable with respect to T we obtain 



E[Wi — Wi\JF] 



11 1 1 JL r 

;/ // ^/n n jr^ J 1 J 
With the help of independence and the construction of the exchangeable pair we obtain



E 



E 



c,- pi, . . . ,a r , 



E[W[ - Wi\T] 



E[(Tj\(<Jk)k^j]- Applying Lemma [3731 yields 

ii i i n 

S n>i + — - £ Cj tanh(/3m^.(a, + 

11 1 1 n 

+ — - ]T % tanh(/3m,(a, + H]) + R hi 



n n 



n n 



3=1 



with 



1 1 ™ 



n n 



J2 Cj [tanh(/3m^(a, £) + - tanh(/3m,(a, + 



i=i 



(4.2) 



Now it is important to note that

$$\tanh\big(\beta m_j(\sigma,\xi) + h\xi_j^l\big) = \tanh\Big(\Big(\beta\frac{S_n}{n} + he_l,\ \xi_j\Big)\Big).$$

Thus, with Lemma 3.1 we have



1 n 

^£jtanh(/3m i ( ( 7,£) + / i ej) 



1 /„ S, 



U 3=l 



This equation yields 



E[W<- Wi\F] 



\ n 
1 d 







(5^ + h6 u - hS u + — $ (3^ + he, 

oXi V n 



n <9A, 



<I>(/3^+/ie,) - //,. 



(4.3) 



We continue by applying (1.12) and (5.2) (see Appendix) to the first summand in (4.3). Since
$\lambda_l^n(\xi)$ is a unique maximum point of $\Phi(\lambda)$ we have
$\nabla\Phi(\lambda_l^n(\xi)) = 0$. We also note that, by (1.12),
$\beta x_i + h\delta_{il} - (\lambda_l^n(\xi))_i = 0$. Thus, the first summand in (4.3) is
equal to



+ 



E 



with 

Abbreviating 
we have 



R 



v / Q2 



n 



2.i 



E 



t=k \dXid\ t 



$(Af(0) 



+ E 



r? 



i?(i) :— i?i 5 j + i?2,i 



(4.4) 
(4.5) 



E\Wl-Wi\7) = -E 



5 2 



n ^ \dXidX t 



*(A?(0) W + i2(t) 



^([l? a *(A?(0)]..,W> + i2(i 



(4.6) 



where $(\cdot,\cdot)$ denotes the Euclidean scalar product and
$[D^2\Phi(\lambda_l^n(\xi))]_i$ denotes the first $k$ entries of the $i$th row of the matrix
$D^2\Phi(\lambda_l^n(\xi))$. We obtain



E[W- W | T] = - D 2 $(X?(£)) W + R{W), 



(4.7) 
with $R(W) = (R(1),\dots,R(k))$. We define

$$\Lambda := \frac{\beta}{n}\big[-D^2\Phi(\lambda_l^n(\xi))\big]_{k\times k}.$$

By Proposition 1.1 (1), $-D^2\Phi(\lambda_l^n(\xi))$ is uniformly positive definite and thus
$\Lambda$ is invertible. We derived the linear regression condition for the sigma-algebra
$\mathcal{F}$, but it should be noted that it also yields the linear regression condition for
the sigma-algebra generated by $W$, since $W$ is measurable with respect to $\mathcal{F}$. In
this case the linear regression condition (2.1) is fulfilled.

Proof of Theorem 1.3. With (4.7) we are able to apply Theorem 2.1. Since the Hessian matrix
of $\Phi$ and $\beta$ itself are constants we have $\lambda^{(i)} = O(n)$. We continue by
estimating $C$ taken from Theorem 2.1. We start by giving a bound for $R_{1,i}$, defined in
(4.2). Since $\tanh$ is 1-Lipschitz we obtain



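\[
\begin{aligned}
|R_{1,i}|
&\le \frac{1}{\sqrt{n}} \frac{1}{n} \sum_{j=1}^{n} \left| \xi_j^i \left( \tanh(\beta m_j(\sigma,\xi) + h\xi_j^1) - \tanh(\beta \langle m(\sigma), \xi_j \rangle + h\xi_j^1) \right) \right| \\
&\le \frac{1}{\sqrt{n}} \frac{1}{n} \sum_{j=1}^{n} \beta \left| m_j(\sigma,\xi) - \langle m(\sigma), \xi_j \rangle \right| \\
&\le \frac{\beta}{\sqrt{n}} \frac{1}{n} \sum_{j=1}^{n} \frac{1}{n} \sum_{\mu=1}^{p} \left| \xi_j^\mu \xi_j^\mu \sigma_j \right|
\le \frac{\beta p}{n} \frac{1}{\sqrt{n}}.
\end{aligned}
\]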
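The Lipschitz step can be checked numerically. The following is a minimal sketch and not the paper's construction: it assumes that $m_j(\sigma,\xi)$ is the field at site $j$ without the self-interaction term, so that it differs from $\langle m(\sigma), \xi_j \rangle$ by exactly $p\sigma_j/n$. The function names and the parameter values are hypothetical.

```python
import math
import random

def overlaps(sigma, xi):
    """Overlap vector m(sigma): m_mu = (1/n) * sum_j xi[mu][j] * sigma[j]."""
    n = len(sigma)
    return [sum(x * s for x, s in zip(row, sigma)) / n for row in xi]

def full_field(j, sigma, xi):
    """<m(sigma), xi_j>: field at site j including the self-interaction term."""
    m = overlaps(sigma, xi)
    return sum(m[mu] * xi[mu][j] for mu in range(len(xi)))

def cavity_field(j, sigma, xi):
    """ASSUMED form of m_j(sigma, xi): field at site j without the term l = j."""
    n = len(sigma)
    return sum(
        xi[mu][j] * xi[mu][l] * sigma[l]
        for mu in range(len(xi))
        for l in range(n)
        if l != j
    ) / n

def max_tanh_gap(sigma, xi, beta, h):
    """Largest difference of the two tanh terms over all sites j."""
    return max(
        abs(
            math.tanh(beta * cavity_field(j, sigma, xi) + h * xi[0][j])
            - math.tanh(beta * full_field(j, sigma, xi) + h * xi[0][j])
        )
        for j in range(len(sigma))
    )

# Hypothetical parameters, chosen only for illustration.
rng = random.Random(0)
n, p, beta, h = 40, 5, 0.7, 0.3
xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
sigma = [rng.choice((-1, 1)) for _ in range(n)]

gap = max_tanh_gap(sigma, xi, beta, h)
bound = beta * p / n  # 1-Lipschitz property of tanh times |p * sigma_j / n|
print(gap <= bound + 1e-12)
```

Under this assumption each summand is at most $\beta p / n$, and the extra prefactor $1/\sqrt{n}$ gives the order $\beta p / n^{3/2}$ used for $R_{1,i}$.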



For the estimation of $R_{2,i}$ we note that by Lemma 3.4 we have for the second part of (4.4)

\[
\mathbb{E}\left[ O\left( \sum_{l,t=1}^{p} \frac{1}{n\sqrt{n}}\, W_l W_t \right) \right] = O\left( \frac{p^2}{n^{3/2}} \right).
\]



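For the first part of (4.4) we note that by Lemma 3.2, since $i \notin \{k+1, \ldots, p\}$ and $t \notin \{1, \ldots, k\}$,

\[
\left| \frac{\partial^2}{\partial\lambda_i \partial\lambda_t} \Phi(\lambda^n(\xi)) \right| \le c_3 \sqrt{p}\, \epsilon_n
\]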

since this expression is a non-diagonal entry of the matrix — Cf(£). Thus we obtain that 



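\[
\mathbb{E}\left| \frac{\beta}{n} \sum_{t=k+1}^{p} \left( \frac{\partial^2}{\partial\lambda_i \partial\lambda_t} \Phi(\lambda^n(\xi)) \right) W_t \right| = O\left( \frac{p\sqrt{p}\, \epsilon_n}{n} \right)
\]

and finally

\[
\mathbb{E}|R_{2,i}| = O\left( \max\left\{ \frac{p\sqrt{p}\, \epsilon_n}{n},\ \frac{p^2}{n^{3/2}} \right\} \right).
\tag{4.8}
\]

Thus we have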



i=l 



max < Py/pe 



p" 



The next thing we notice is that for all i G {1, . . . , k} 

1 



< 



n 
We easily obtain the bound $B = O(n^{-1/2})$. The only thing left to do is to calculate the
tedious conditional variance in $A$. We have:

in in 

niw' i -w i ){w j -w j )\ j] = - 3 e + ^ e nct<ey r \ j 7 ] 



t,r=l 



71° 



t,r=l 



e i j] 

t,r=l 



(4.9) 



To bound the variances of these three terms we abbreviate 

1 n 1 



Thus, 



n 



t=i 



rV 



n- 
1 



m i {a)m j {a J 



rV 



< —const. max< — V 



rr 



-V 

n 



< ^^(EWWjl+nBWl). 



Using Lemma 3.4 we obtain $\mathbb{V}[A_1] = O(n^{-3})$. For $A_2$ we obtain



1 n 



t,r=l 



Ie 

n 



1 A 



n 



n 



r=l 



Next we use the identity $\mathbb{V}[X] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ for a random variable $X$ and a conditional
version of Jensen's inequality in order to obtain that $\mathbb{V}[A_2] \le \mathbb{V}[A_1] = O(n^{-3})$, since $\sigma'$ is an
identical copy of $\sigma$. With Lemma 3.3 we get

1 n 

U t,r=l 
1 n 

-3 E £^tanh(m^,£) + /i£l) 



rr 



t,r=l 



rr 



E ZMt ftanh(m*(a,0 + - tanh(m t (a,0 + K\) 



t,r=l 



+ ~ E £X£>nh(m t ( ( 7,£) + ^) 
: M 1 + M 2 . 



(4.10) 
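The comparison of $\mathbb{V}[A_2]$ with $\mathbb{V}[A_1]$ above rests on a general fact: by the variance identity and the conditional Jensen inequality, a conditional expectation never has larger variance than the variable itself. The following sketch checks this on a small finite probability space with exact rational arithmetic. The sample space and all names are hypothetical and unrelated to the actual $A_1$, $A_2$.

```python
from fractions import Fraction

# Hypothetical finite sample space: (label g of the conditioning event,
# value x of the random variable X, probability of the outcome).
outcomes = [
    (0, Fraction(-1), Fraction(1, 8)),
    (0, Fraction(3), Fraction(3, 8)),
    (1, Fraction(2), Fraction(1, 4)),
    (1, Fraction(-2), Fraction(1, 4)),
]

def expectation(f):
    """E[f(G, X)] on the finite sample space."""
    return sum(prob * f(g, x) for g, x, prob in outcomes)

def variance(f):
    """V[Y] = E[Y^2] - (E[Y])^2 for Y = f(G, X)."""
    mean = expectation(f)
    return expectation(lambda g, x: f(g, x) ** 2) - mean ** 2

def conditional_mean(label):
    """E[X | G = label]."""
    mass = sum(prob for g, _, prob in outcomes if g == label)
    return sum(prob * x for g, x, prob in outcomes if g == label) / mass

var_x = variance(lambda g, x: x)
var_cond = variance(lambda g, x: conditional_mean(g))
print(var_x, var_cond, var_cond <= var_x)
```

Both variables have the same mean, so the inequality reduces to $\mathbb{E}[(\mathbb{E}[X \mid G])^2] \le \mathbb{E}[X^2]$, which is the conditional Jensen inequality for the square function.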



Using the same estimates as for $R_{1,i}$ we obtain



Mi < 



iEe: 



r=l 



0p 



1 W j 
-PP\ 
2

Hence $\mathbb{V}[M_1] = O(\ldots)$ by Lemma 3.4. Additionally, using Lemma 3.1, (5.2) and the
abbreviation $\Phi^{(2),i,j}(\lambda) :=$



Mo 



+ Xi + 




d_ 

dX,, 

V 



* ( (3^ + he> 
n 



+ £ (* (2,A, w) ^ + 1 o 

+ 1 V *t> 7 + 1 



t=l 



i,f=l 



n 



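Since we are estimating the variance of the expressions, constant expressions will vanish. Hence,
using Lemma 3.4 and Lemma 3.2 in the same way as for (4.8), we have

\[
\mathbb{V}[M_2] = O\left( \max\left\{ \frac{p^3 \epsilon_n^2}{n^3},\ \frac{p^2}{n^3} \right\} \right).
\]

Therefore $\mathbb{V}[A_3]$ can be bounded by $O\left( \max\left\{ \frac{p^3 \epsilon_n^2}{n^3}, \frac{p^2}{n^3} \right\} \right)$. Thus the variance in $A$ of Theorem
2.1 can be bounded by 9 times the maximum of the variances of $A_1, A_2, A_3$. Consequently we
obtain

\[
A = \sum_{i,j=1}^{k} \lambda^{(i)} \sqrt{ \mathbb{V}\left[ \mathbb{E}[(W_i' - W_i)(W_j' - W_j) \mid W] \right] }
= O\left( \max\left\{ \frac{p^{3/2} \epsilon_n}{n^{1/2}},\ \frac{p}{n^{1/2}} \right\} \right)
\]

and this completes the proof. □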



Proof of Theorem 1.5. Having seen the proof of Theorem 1.3, this proof is very simple.
We first note that Theorem 2.2 can be applied since the regression condition is the same
as for Theorem 1.3. $A_1$ matches $A$ taken from the same proof and thus

\[
\log(n) A_1 = O\left( \log(n) \max\left\{ \frac{p^{3/2} \epsilon_n}{n^{1/2}},\ \frac{p}{n^{1/2}} \right\} \right).
\]

Using Lemma 3.4 and the estimation of the $C$-term in Theorem 1.3 we have that the second
expression is $O(\log(n) \max\{\ldots\})$. The same Lemma, $A$ being of order $\frac{1}{\sqrt{n}}$, and
$A_3 = O(n)$ yield that the third and fourth expression have the order $O(\log(n)\, n^{-1/2})$. Thus the
theorem is proven. □

Proof of Theorem 1.6. In order to prove the theorem we have to make small adjustments to
the proof of Theorem 1.3. Using the same techniques as before we arrive at

\[
\mathbb{E}[W' - W \mid \mathcal{F}] = \frac{\beta}{n} [D^2\Phi(\lambda^n(\xi))]\, W + R(W),
\]

with $R(W) = (R(1), \ldots, R(p))$, where $R(i) = R_{1,i} + R_{2,i}$ with $R_{1,i}$ taken from (4.2) and

^ := ±o(inm. (4.n) 

This expression is the central difference to the proof of Theorem 1.3. The expression
(4.4) contained the expression

\[
\frac{\beta}{n} \sum_{t=k+1}^{p} \left( \frac{\partial^2}{\partial\lambda_i \partial\lambda_t} \Phi(\lambda^n(\xi)) \right) W_t,
\tag{4.12}
\]

which made us use Lemma 3.2. Now (4.12) is part of $\Lambda W$, since $p$ is a constant and we do not
need a projection to define $W$. Thus our expression (4.11) contains just the second expression of
the right hand side of (4.4). Fortunately this can be estimated using Lemma 3.4. Thus, without
using Lemma 3.2, the computation of the rate of convergence gets a lot easier. Again it only
remains to estimate $A$, $B$ and $C$ taken from Theorem 2.1. We note that $B$ is the same as in
Theorem 1.3. Thus $B = O(n^{-1/2})$. $R_{1,i}$ is the same as in (4.2) and is bounded in the same way
as in Theorem 1.3. Since $R_{2,i}$ was part of (4.4) and $p$ is fixed, we obtain by using Lemma 3.4

\[
\mathbb{E}|R_{2,i}| = O(n^{-3/2}).
\tag{4.13}
\]

In comparison to Theorem 1.3 and the bound in (4.8) we notice that the first part of the
maximum does not appear, since the expression (4.12) is not part of $R_{2,i}$, and the second part of
the maximum is the same as the bound in (4.13) with $p$ constant. Using the bounds on $R_{1,i}$ and
$R_{2,i}$ we obtain $C = O(n^{-1/2})$. We split the conditional expectation in the expression $A$ in the same way
as in (4.9) and note that $A_1$ and $A_2$ are estimated in exactly the same way as in the proof of
Theorem 1.3. Finally we note that for $p$ fixed we can also split $A_3$ as in (4.10) and that, for
the same reasons that led to (4.13), $\mathbb{V}[M_1] = \mathbb{V}[M_2] = O(n^{-3})$. Hence $A = O(n^{-1/2})$. □

Proof of Theorem \1.7[ The proof uses the fact that the conditional joint distribution of the 
(<7j)j, conditioned on the event j ^ — x*ei < ej, is given by 

Pn,p,t{v) = -~^—exjp(-pH n (o-,£))l B{x * eh€) (— 

where Z n> p£ denotes a normalization. Thus we are able to follow the lines of the proof of 
Theorem 1.3. □



5. Appendix 



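For the proofs of the theorems for the Hopfield model we need a multivariate second-order
Taylor expansion of $\Phi(\lambda)$ defined in (3.1). Let us denote by $D^2\Phi(\lambda)$ the Hessian matrix
$\{\partial^2\Phi(\lambda)/\partial\lambda_i \partial\lambda_j;\ i,j = 1, \ldots, p\}$ of $\Phi$ at $\lambda$. We obtain

\[
\begin{aligned}
\Phi(u) = {} & \Phi(\lambda) + \sum_{k=1}^{p} \frac{\partial}{\partial u_k}\Phi(\lambda)\,(u_k - \lambda_k) + \frac{1}{2} \left\langle (u - \lambda),\, D^2\Phi(\lambda) \cdot (u - \lambda) \right\rangle \\
& + \sum_{t,k,j=1}^{p} R_{t,k,j}\, (u_t - \lambda_t)(u_k - \lambda_k)(u_j - \lambda_j),
\end{aligned}
\tag{5.1}
\]

with $|R_{t,k,j}| \le \left\| \frac{\partial^3}{\partial u_t \partial u_k \partial u_j} \Phi \right\|_\infty$. For any fixed $m \in \{1, \ldots, p\}$ and any $\lambda, u \in \mathbb{R}^p$ it follows that

\[
\frac{\partial}{\partial u_m}\Phi(u) = \frac{\partial}{\partial u_m}\Phi(\lambda) + \sum_{k=1}^{p} \frac{\partial^2}{\partial u_k \partial u_m}\Phi(\lambda)\,(u_k - \lambda_k) + O\left( \sum_{k,t=1}^{p} (u_k - \lambda_k)(u_t - \lambda_t) \right).
\tag{5.2}
\]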
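The first-order expansion (5.2) of a partial derivative can be illustrated numerically. Since the definition (3.1) of $\Phi$ is not repeated in this section, the sketch below uses a stand-in function of a similar type, $\Phi(\lambda) = -|\lambda|^2/2 + \frac{1}{n}\sum_j \log\cosh(\langle \lambda, \xi_j \rangle)$. This choice, the function names and the parameter values are assumptions made only for illustration. For this stand-in the linearization error at $u = \lambda + t d$ is at most $\frac{1}{2} t^2 \|d\|_1^2$, because $|\tanh''| \le 4/(3\sqrt{3}) < 1$.

```python
import math
import random

def site_fields(lam, xi):
    """<lam, xi_j> for every site j."""
    p, n = len(xi), len(xi[0])
    return [sum(lam[k] * xi[k][j] for k in range(p)) for j in range(n)]

def grad_phi(lam, xi):
    """Gradient of the ASSUMED Phi(lam) = -|lam|^2/2 + (1/n) sum_j log cosh(<lam, xi_j>)."""
    p, n = len(xi), len(xi[0])
    fields = site_fields(lam, xi)
    return [
        -lam[m] + sum(xi[m][j] * math.tanh(fields[j]) for j in range(n)) / n
        for m in range(p)
    ]

def hessian_phi(lam, xi):
    """Hessian of the same assumed Phi; uses tanh'(x) = 1 - tanh(x)^2."""
    p, n = len(xi), len(xi[0])
    sech2 = [1.0 - math.tanh(f) ** 2 for f in site_fields(lam, xi)]
    return [
        [
            -(1.0 if m == k else 0.0)
            + sum(xi[m][j] * xi[k][j] * sech2[j] for j in range(n)) / n
            for k in range(p)
        ]
        for m in range(p)
    ]

def linearization_error(lam, direction, t, xi):
    """max over m of |d_m Phi(u) - d_m Phi(lam) - sum_k H_mk(lam) (u_k - lam_k)|, u = lam + t*d."""
    p = len(lam)
    u = [lam[k] + t * direction[k] for k in range(p)]
    g_u, g_lam, hess = grad_phi(u, xi), grad_phi(lam, xi), hessian_phi(lam, xi)
    return max(
        abs(g_u[m] - g_lam[m] - sum(hess[m][k] * t * direction[k] for k in range(p)))
        for m in range(p)
    )

# Hypothetical patterns, base point and direction.
rng = random.Random(1)
n, p = 50, 3
xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
lam = [0.4, -0.2, 0.1]
direction = [1.0, 0.5, -0.5]
l1_norm = sum(abs(d) for d in direction)

errors = {t: linearization_error(lam, direction, t, xi) for t in (0.2, 0.1, 0.05)}
for t, err in errors.items():
    print(t, err, err <= 0.5 * (t * l1_norm) ** 2)
```

The quadratic part of the stand-in is linearized exactly, so the whole error comes from the $\tanh$ terms. This is the second-order behaviour that the $O(\cdot)$ term in (5.2) records.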